Sampling-based approximate skyline calculation on big data

Abstract

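Nowadays, big data is coming to the fore in a lot of applications. Processing a skyline query on big data in more than linear time is by far too expensive, and often even linear time may be too slow. It is obviously not possible to compute an exact solution to a skyline query in sublinear time, since a skyline query result itself may have linear size. Fortunately, in many situations, a fast approximate solution is more useful than a slower exact solution. This paper proposes two sampling-based approximate algorithms for processing skyline queries on big data. The first algorithm obtains a fixed-size sample and computes the approximate skyline on it. The error of this algorithm is not only relatively small in most cases, but also almost unaffected by the input size. The second algorithm efficiently returns a [Formula: see text]-approximation of the exact skyline. Its running time has nothing to do with the input size in practice, achieving the goal of sublinearity on big data. Experiments verify the error analysis of the first algorithm, and show that the second algorithm is much faster than existing skyline algorithms.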
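The general idea behind the first algorithm (draw a fixed-size sample, then compute the skyline of the sample only) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `dominates`, `skyline`, and `sampled_skyline`, the uniform sampling without replacement, the "smaller is better" convention, and the naive quadratic skyline routine are all assumptions made here for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q (assumption: smaller is better in every dimension)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def sampled_skyline(points: Sequence[Point], sample_size: int, seed: int = 0) -> List[Point]:
    """Hypothetical helper: skyline of a fixed-size uniform sample of the input.

    The skyline cost depends on sample_size rather than on len(points).
    The result is approximate: a returned point may be dominated by an
    input point that was not sampled.
    """
    rng = random.Random(seed)
    m = min(sample_size, len(points))
    sample = rng.sample(points, m)  # uniform sampling without replacement
    return skyline(sample)


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
    print(sorted(skyline(data)))            # exact skyline of the full input
    print(sorted(sampled_skyline(data, 3)))  # approximate skyline from 3 sampled points
```

Because only the sample is compared pairwise, the skyline step does not grow with the input size, which is the property the abstract highlights; the trade-off is that some true skyline points can be missed and some reported points can be dominated by unsampled ones.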

Related resources

Sampling Based Range Partition Methods for Big Data Analytics

Big Data Analytics requires partitioning datasets into thousands of partitions according to a specific set of keys so that different machines can process different partitions in parallel. Range partition is one of the ways to partition the data that is needed whenever global ordering is required. It partitions the data according to a pre-defined set of exclusive and continuous ranges that cover...

Skyline Computation on Commercial Data

• Our data set contains data on 55208 cars [1].
• To each car, 23 attributes are assigned.
  – correlated (e.g., cylinders and engine size).
  – anti-correlated (e.g., mileage and registration date).
  – nearly independent (e.g., mileage and horsepower).
• Outliers countervail correlation effects.
• Cardinalities differ greatly, e.g.:
  – 5988 different values for attribute price.
  – only 17 different v...

Error-bounded Sampling for Analytics on Big Sparse Data

Aggregation queries are at the core of business intelligence and data analytics. In the big data era, many scalable shared-nothing systems have been developed to process aggregation queries over massive amounts of data. Microsoft’s SCOPE is a well-known instance in this category. Nevertheless, aggregation queries are still expensive, because query processing needs to consume the entire data set, ...

Data Interpolation: An Efficient Sampling Alternative for Big Data Aggregation

Given a large set of measurement sensor data, in order to identify a simple function that captures the essence of the data gathered by the sensors, we suggest representing the data by (spatial) functions, in particular by polynomials. Given a (sampled) set of values, we interpolate the data points to define a polynomial that would represent the data. The interpolation is challenging, since in pr...

Importance Sampling Algorithms for Belief Networks based on Approximate Computation

In this paper we study a new general class of algorithms for the propagation of probabilities on graphical structures based on importance sampling techniques. The idea is to make an approximate and fast propagation in order to obtain a sampling distribution as close as possible to the true one. Our proposal is based on a deletion sequence of the variables to calculate the 'a posteriori' probabi...

Journal

Journal title: Discrete Mathematics, Algorithms and Applications

Year: 2021

ISSN: 1793-8309, 1793-8317

DOI: https://doi.org/10.1142/s1793830922500240